SeqNLS: Nuclear Localization Signal Prediction Based on Frequent Pattern Mining and Linear Motif Scoring
Abstract
Nuclear localization signals (NLSs) are stretches of residues in proteins that mediate their import into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose SeqNLS, a sequential pattern mining algorithm that identifies potential NLS patterns without being constrained by current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates, which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by relatively local conservation (IRLC) based masking. Experimental results on the newly curated Yeast and Hybrid datasets show that SeqNLS is effective in detecting potential NLSs. A comparison of SeqNLS with and without the linear motif scoring shows that linear motif features are highly complementary to sequence features in discerning NLSs. On the two independent datasets, SeqNLS not only consistently finds over 50% of NLSs with a prediction precision of at least 0.7, but also outperforms other state-of-the-art NLS prediction methods in terms of F1 score or prediction precision at similar or higher recall rates. The SeqNLS web server is available at http://mleg.cse.sc.edu/seqNLS.
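To make the idea of mining frequent patterns from known NLSs more concrete, here is a minimal Python sketch. It is not the SeqNLS algorithm: it mines only contiguous k-mers by support (the number of distinct sequences containing them) and omits the gapped sequential patterns, disorder-based linear motif scoring, and IRLC masking described above. The function names, the parameters `k` and `min_support`, and the toy sequences are all assumptions made for illustration; the first toy fragment is the classical SV40 large T-antigen NLS and the other two are invented variants.

```python
from collections import Counter


def mine_frequent_kmers(nls_seqs, k=4, min_support=2):
    """Return k-mers that occur in at least min_support distinct sequences."""
    support = Counter()
    for seq in nls_seqs:
        # Count each k-mer once per sequence (support, not raw frequency).
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


def scan_protein(protein, patterns):
    """Return sorted (start, kmer) hits of the mined patterns in a protein."""
    hits = []
    for kmer in patterns:
        start = protein.find(kmer)
        while start != -1:
            hits.append((start, kmer))
            start = protein.find(kmer, start + 1)
    return sorted(hits)


# Toy training fragments (illustrative only, not taken from the paper's datasets).
toy_nls = ["PKKKRKV", "AKKKRKL", "GKKRKRS"]
patterns = mine_frequent_kmers(toy_nls, k=4, min_support=2)

# Hypothetical query protein containing the first fragment.
hits = scan_protein("MSTPKKKRKVEDP", patterns)
print(patterns)
print(hits)
```

In this sketch the mined patterns act only as candidate generators; a real predictor would still need a filtering step, such as the scoring and masking stages the abstract describes, to reach usable precision.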
Similar resources
Personal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)
Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...
Systematic identification of cell cycle-dependent yeast nucleocytoplasmic shuttling proteins by prediction of composite motifs.
The cell cycle-dependent nucleocytoplasmic transport of proteins is predominantly regulated by CDK kinase activities; however, it is currently difficult to predict the proteins thus regulated, largely because of the low prediction efficiency of the motifs involved. Here, we report the successful prediction of CDK1-regulated nucleocytoplasmic shuttling proteins using a prediction system for nucl...
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences
MOTIVATION: Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not cont...
SUMO Substrates and Sites Prediction Combining Pattern Recognition and Phylogenetic Conservation
Motivation: Small Ubiquitin-related modifier (SUMO) proteins are widely expressed in eukaryotic cells, which are reversibly coupled to their substrates by motif recognition, called sumoylation. Two interesting questions are 1) how many potential SUMO substrates may be included in mammalian proteomes, such as human and mouse, 2) and given a SUMO substrate, can we recognize its sumoylation sites?...
Prediction of nuclear export signals using weighted regular expressions (Wregex)
MOTIVATION: Leucine-rich nuclear export signals (NESs) are short amino acid motifs that mediate binding of cargo proteins to the nuclear export receptor CRM1, and thus contribute to regulating the localization and function of many cellular proteins. Computational prediction of NES motifs is of great interest, but remains a significant challenge. RESULTS: We have developed a novel approach for ami...